
Analyzing the Prices of Boston Airbnb Rentals: What Affects Prices and Have Prices Changed Since the Pandemic?

By Josh Lavitz and Max Nguyen

Note that we hyperlink to additional resources throughout this tutorial; they may be useful for explaining terms and concepts more thoroughly!

**Introduction**

Airbnb is a company that provides short-term rentals for people to stay in. Airbnb has helped host over 800 million people, with 5.6 million currently active listings across 220 regions around the world. Despite the challenges to travel and tourism posed by the COVID-19 pandemic, the company has managed to adapt, still making a profit of $219 million. As work becomes remote, people are using Airbnbs to get away from home and do their work from anywhere. Some people are also using Airbnbs as temporary places to socially distance or quarantine.

In this tutorial, we will explore what affects the prices of Airbnb rentals and how Airbnb prices have changed since the pandemic, if at all. The cost of rentals is a major consideration for people, and economic considerations are perhaps even more important during these times. Specifically, we will be looking at data for rentals in Boston, a major city that was recently ranked as one of the three best large cities in America. We will be comparing the listings from October 2020 to the listings from October 2019, as October 2019 was before the pandemic struck the U.S. This data exploration will hopefully be of interest to anyone looking to stay in an Airbnb during this time or in the future, once circumstances are better.

Python Libraries

In this tutorial, we use Python 3 and a few helpful Python libraries. We list and briefly describe the primary ones below:

  • Pandas - It provides functions to read data from different file formats into a DataFrame object, which then allows for easy data manipulation.
  • Folium - We use this to create interactive data maps.
  • Plotly - This library provides a lot of functions to make all sorts of graphs like line plots, scatter plots, etc. Most of the plots made with this library will show more detailed information by hovering over it!
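  • Statsmodels - It provides classes and functions for estimating statistical models.
  • Requests - We use this to download the GeoJSON file over HTTP.
  • Json - Python's built-in module for working with JSON data.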
In [13]:
import warnings
warnings.filterwarnings('ignore')

**Data Collection**

Listings Data

Our data comes from Inside Airbnb, which scrapes data from the public listings on the Airbnb website. Since Inside Airbnb has already done the web scraping for us to produce a dataset of listings, we do not have to do much complicated work. As previously mentioned, we are looking at the rentals for Boston, the capital city of Massachusetts. To compare how prices have changed since the pandemic started, we will compare the dataset that was scraped from Airbnb on October 24, 2020 to the data scraped on October 19, 2019, so the two snapshots are almost exactly a year apart.

From the CSV files for October 2020 and October 2019 provided by Inside Airbnb, we read the data into two separate Pandas dataframes (which are like tables) so that the data can be easily accessed and manipulated.

In [14]:
import pandas # Imports the pandas library for use
In [15]:
# Uses the built-in function to read in the 2020 data from the CSV file
df_2020 = pandas.read_csv("https://raw.githubusercontent.com/joshlavitz/joshlavitz.github.io/main/listings.csv")

df_2020 # Displays the dataframe
Out[15]:
id listing_url scrape_id last_scraped name description neighborhood_overview picture_url host_id host_url ... review_scores_communication review_scores_location review_scores_value license instant_bookable calculated_host_listings_count calculated_host_listings_count_entire_homes calculated_host_listings_count_private_rooms calculated_host_listings_count_shared_rooms reviews_per_month
0 3781 https://www.airbnb.com/rooms/3781 20201024170420 2020-10-24 HARBORSIDE-Walk to subway Fully separate apartment in a two apartment bu... Mostly quiet ( no loud music, no crowed sidewa... https://a0.muscache.com/pictures/24670/b2de044... 4804 https://www.airbnb.com/users/show/4804 ... 10.0 10.0 10.0 NaN f 1 1 0 0 0.26
1 5506 https://www.airbnb.com/rooms/5506 20201024170420 2020-10-24 **$49 Special ** Private! Minutes to center! Private guest room with private bath, You do n... Peacful, Architecturally interesting, historic... https://a0.muscache.com/pictures/1598e8b6-5a55... 8229 https://www.airbnb.com/users/show/8229 ... 10.0 9.0 10.0 Exempt: This listing is a unit that has contra... f 6 6 0 0 0.76
2 6695 https://www.airbnb.com/rooms/6695 20201024170420 2020-10-24 $99 Special!! Home Away! Condo Comfortable, Fully Equipped private apartment... Peaceful, Architecturally interesting, histori... https://a0.muscache.com/pictures/38ac4797-e7a4... 8229 https://www.airbnb.com/users/show/8229 ... 10.0 9.0 10.0 STR-404620 f 6 6 0 0 0.84
3 10730 https://www.airbnb.com/rooms/10730 20201024170420 2020-10-24 Bright 1bed facing Golden Dome Bright, spacious unit, new galley kitchen, new... Beacon Hill is located downtown and is conveni... https://a0.muscache.com/pictures/miso/Hosting-... 26988 https://www.airbnb.com/users/show/26988 ... 10.0 10.0 9.0 NaN f 7 7 0 0 0.24
4 10813 https://www.airbnb.com/rooms/10813 20201024170420 2020-10-24 Back Bay Apt-blocks to subway, Newbury St, The... Stunning Back Bay furnished studio apartment. ... Wander around this quintessential neighborhood... https://a0.muscache.com/pictures/20b5b9c9-e1f4... 38997 https://www.airbnb.com/users/show/38997 ... 10.0 10.0 10.0 NaN f 11 11 0 0 0.94
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3249 46021420 https://www.airbnb.com/rooms/46021420 20201024170420 2020-10-24 Stunning 1BR in Downtown + 100 WalkScore | Evo... Whether you are just getting away for the week... Downtown’s Theater District bustles with energ... https://a0.muscache.com/pictures/956c6254-61ea... 212359760 https://www.airbnb.com/users/show/212359760 ... NaN NaN NaN Exempt: This listing is a unit used for furnis... t 43 43 0 0 NaN
3250 46021809 https://www.airbnb.com/rooms/46021809 20201024170420 2020-10-24 Spacious and Modern 2BD in the Heart of Boston This is a modern 2 bed in The Heart of Boston<... NaN https://a0.muscache.com/pictures/70f28a8d-36e0... 2356643 https://www.airbnb.com/users/show/2356643 ... NaN NaN NaN NaN t 11 11 0 0 NaN
3251 46022872 https://www.airbnb.com/rooms/46022872 20201024170420 2020-10-24 Room in Large Brookline House, Phenomenal Loca... Room A in 7 Bed, 3 Bath<br />Extremely spaciou... Just off Harvard Ave, connecting Packards Corn... https://a0.muscache.com/pictures/2114bef5-443a... 373050156 https://www.airbnb.com/users/show/373050156 ... NaN NaN NaN NaN t 2 0 2 0 NaN
3252 46024344 https://www.airbnb.com/rooms/46024344 20201024170420 2020-10-24 Furnished Room, Big Brookline House, Top Location Room C in 7 Bed, 3 Bath apartment<br />Extreme... Just off Harvard Ave, connecting Packards Corn... https://a0.muscache.com/pictures/2114bef5-443a... 373050156 https://www.airbnb.com/users/show/373050156 ... NaN NaN NaN NaN t 2 0 2 0 NaN
3253 46025053 https://www.airbnb.com/rooms/46025053 20201024170420 2020-10-24 A place of your own | Studio in Boston Stay for 30+ nights (minimum nights and rates ... NaN https://a0.muscache.com/pictures/8860911a-df51... 359229620 https://www.airbnb.com/users/show/359229620 ... NaN NaN NaN NaN t 177 177 0 0 NaN

3254 rows × 74 columns

In [16]:
# Uses the built-in function to read in the 2019 data from the CSV file
df_2019 = pandas.read_csv("https://raw.githubusercontent.com/joshlavitz/joshlavitz.github.io/main/listings2019.csv")

df_2019 # Displays the dataframe
Out[16]:
id listing_url scrape_id last_scraped name summary space description experiences_offered neighborhood_overview ... instant_bookable is_business_travel_ready cancellation_policy require_guest_profile_picture require_guest_phone_verification calculated_host_listings_count calculated_host_listings_count_entire_homes calculated_host_listings_count_private_rooms calculated_host_listings_count_shared_rooms reviews_per_month
0 3781 https://www.airbnb.com/rooms/3781 20191018230017 2019-10-19 HARBORSIDE-Walk to subway Fully separate apartment in a two apartment bu... This is a totally separate apartment located o... Fully separate apartment in a two apartment bu... none Mostly quiet ( no loud music, no crowed sidewa... ... f f super_strict_30 f f 2 2 0 0 0.29
1 5506 https://www.airbnb.com/rooms/5506 20191018230017 2019-10-19 **$99 Special ** Private! Minutes to center! Private guest room with private bath, You do n... **THE BEST Value in BOSTON!!*** PRIVATE GUEST ... Private guest room with private bath, You do n... none Peacful, Architecturally interesting, historic... ... t f strict_14_with_grace_period f f 6 6 0 0 0.80
2 6695 https://www.airbnb.com/rooms/6695 20191018230017 2019-10-19 $99 Special!! Home Away! Condo NaN ** WELCOME *** FULL PRIVATE APARTMENT In a His... ** WELCOME *** FULL PRIVATE APARTMENT In a His... none Peaceful, Architecturally interesting, histori... ... t f strict_14_with_grace_period f f 6 6 0 0 0.89
3 6976 https://www.airbnb.com/rooms/6976 20191018230017 2019-10-19 Mexican Folk Art Showcase in Boston Neighborhood Come stay with me in Boston's Roslindale neigh... This is a well-maintained, two-family house bu... Come stay with me in Boston's Roslindale neigh... none The LOCATION: Roslindale is a safe and diverse... ... f f moderate t f 1 0 1 0 0.66
4 8789 https://www.airbnb.com/rooms/8789 20191018230017 2019-10-18 Curved Glass Studio/1bd facing Park Bright, 1 bed with curved glass windows facing... Fully Furnished studio with enclosed bedroom. ... Bright, 1 bed with curved glass windows facing... none Beacon Hill is a historic neighborhood filled ... ... f f strict_14_with_grace_period f f 10 10 0 0 0.38
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5642 39461104 https://www.airbnb.com/rooms/39461104 20191018230017 2019-10-19 Convenient North End Studio w/ W/D + Gym near ... Show up and start living from day one in Bosto... Gorgeous furniture, fully-equipped kitchen, sm... Show up and start living from day one in Bosto... none This furnished apartment is located in the Nor... ... t f flexible f f 92 92 0 0 NaN
5643 39461138 https://www.airbnb.com/rooms/39461138 20191018230017 2019-10-19 Equipped North End Studio w/ W/D (BOS128) Show up and start living from day one in Bosto... Gorgeous furniture, fully-equipped kitchen, sm... Show up and start living from day one in Bosto... none This furnished apartment is located in the Nor... ... t f flexible f f 92 92 0 0 NaN
5644 39461190 https://www.airbnb.com/rooms/39461190 20191018230017 2019-10-19 Comfy North End Studio w/ Doorman + W/D near T... Show up and start living from day one in Bosto... Thoughtfully designed with bespoke finishes, m... Show up and start living from day one in Bosto... none This furnished apartment is located in the Nor... ... t f flexible f f 92 92 0 0 NaN
5645 39461223 https://www.airbnb.com/rooms/39461223 20191018230017 2019-10-19 Bespoke North End Studio w/ Gym + W/D near Nor... Discover the best of Boston, with this studio ... Thoughtfully designed with bespoke finishes, m... Discover the best of Boston, with this studio ... none This furnished apartment is located in the Nor... ... t f flexible f f 92 92 0 0 NaN
5646 39462969 https://www.airbnb.com/rooms/39462969 20191018230017 2019-10-19 Your Home in Back Bay! Located on the corner of Gloucester & Newbury ... The apartment is on the third floor - and ther... Located on the corner of Gloucester & Newbury ... none The neighborhood is just fantastic! Five minut... ... f f flexible f f 1 1 0 0 NaN

5647 rows × 106 columns

From the displayed dataframes, we can see that there were 3,254 active listings in Boston on October 24, 2020 and 5,647 active listings on October 19, 2019. This means that there were about 2,400 fewer listings in October 2020, which makes sense considering the challenges with hosting guests in a rental during a pandemic.

Each row corresponds to the information for a specific listing. There are a lot of different columns, some with very long names, and we will not be using all of them for our analysis. Some of the notable columns include:

  • id - The unique identifier for each rental, which corresponds to the URL for the rental.
  • latitude - The latitude coordinates.
  • longitude - The longitude coordinates.
  • neighbourhood_cleansed - The neighborhood of Boston that the rental is located in.
  • room_type - Whether the rental is the entire home or apartment, a private room, a shared room, or a hotel room.
  • number_of_reviews - The total number of reviews that the rental has received from guests.
  • review_scores_rating - The aggregate score out of 100 the rental has from all of the reviews it has received.
  • amenities - A list of the amenities provided to guests like TV, Shampoo, Wi-Fi, etc.
  • price - The cost to stay in the rental in $ per night. This variable is the focus of our investigation!

GeoJSON Data

Inside Airbnb also provides a GeoJSON file, which we can later use to visualize the specific neighborhoods of Boston.

In [17]:
import json
import requests

# Using the requests library, we download the GeoJSON file and parse its JSON content into a Python dictionary
url = 'https://raw.githubusercontent.com/joshlavitz/joshlavitz.github.io/main/neighbourhoods.geojson'
county_geo = requests.get(url).json()
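Before using a GeoJSON file in a map, it can help to check which property holds the neighborhood name, since the map has to match that name against our dataframe. The sketch below uses a tiny made-up GeoJSON dictionary (the coordinates and the `toy_geo` name are purely illustrative) that only assumes the same `properties.neighbourhood` layout used later in this tutorial.

```python
# A tiny, made-up GeoJSON structure; the real file is assumed to follow the same layout
toy_geo = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"neighbourhood": "Roxbury"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}},
        {"type": "Feature",
         "properties": {"neighbourhood": "Back Bay"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[1, 1], [2, 1], [2, 2], [1, 1]]]}},
    ],
}

# Lists the property names available on the first feature
property_names = list(toy_geo["features"][0]["properties"].keys())

# Collects the neighborhood names, which must match the names in the listings data
neighbourhood_names = sorted(f["properties"]["neighbourhood"] for f in toy_geo["features"])

print(property_names, neighbourhood_names)
```

Running the same two lookups on `county_geo` would show the real property names and neighborhood spellings.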

Now that we have read our data from the CSV files into two Pandas dataframes, we have to do some cleaning and curation, which will make our later analysis much easier.

**Data Cleaning and Curation**

Dropping Unnecessary Columns

First, we drop the unnecessary columns in the 2020 dataframe that contain information we are not using in our analysis. The columns we keep are id, neighbourhood_cleansed, latitude, longitude, room_type, number_of_reviews, review_scores_rating, amenities, and price, since we are most interested in the effects that location, room type, reviews, and amenities have on the price of listings. Many of the other columns concern things like host information or the date of the first review on the listing, which are not relevant to our analysis. We also rename 'neighbourhood_cleansed' to 'neighborhood', 'number_of_reviews' to 'num_reviews', and 'review_scores_rating' to 'rating'.

In [18]:
df_2020 = df_2020[['id', 'neighbourhood_cleansed', 
                   'latitude', 'longitude', 
                   'room_type', 'number_of_reviews',
                   'review_scores_rating',
                   'amenities', 'price']]

# Renames the columns to have shorter names (reassigning avoids modifying a slice in place)
df_2020 = df_2020.rename(columns={'neighbourhood_cleansed':'neighborhood',
                                  'number_of_reviews':'num_reviews',
                                  'review_scores_rating':'rating'})

df_2020
Out[18]:
id neighborhood latitude longitude room_type num_reviews rating amenities price
0 3781 East Boston 42.364130 -71.029910 Entire home/apt 17 99.0 ["Cable TV", "Shampoo", "Smoke alarm", "TV", "... $150.00
1 5506 Roxbury 42.329810 -71.095590 Entire home/apt 107 95.0 ["Cable TV", "Shampoo", "Smoke alarm", "TV", "... $145.00
2 6695 Roxbury 42.329940 -71.093510 Entire home/apt 115 96.0 ["Cable TV", "Shampoo", "Smoke alarm", "TV", "... $169.00
3 10730 Downtown 42.358400 -71.061850 Entire home/apt 32 96.0 ["Cable TV", "Smoke alarm", "TV", "Bed linens"... $81.00
4 10813 Back Bay 42.350610 -71.087870 Entire home/apt 10 99.0 ["Cable TV", "Shampoo", "Smoke alarm", "TV", "... $87.00
... ... ... ... ... ... ... ... ... ...
3249 46021420 Beacon Hill 42.353290 -71.065380 Entire home/apt 0 NaN ["Shower gel", "Shampoo", "Smoke alarm", "TV",... $239.00
3250 46021809 Roxbury 42.330500 -71.071270 Entire home/apt 0 NaN ["Air conditioning", "Heating", "Laptop-friend... $47.00
3251 46022872 Allston 42.347372 -71.130569 Private room 0 NaN ["Hangers", "Heating", "Laptop-friendly worksp... $44.00
3252 46024344 Allston 42.348080 -71.129930 Private room 0 NaN ["Hangers", "Heating", "Laptop-friendly worksp... $44.00
3253 46025053 East Boston 42.371010 -71.043770 Entire home/apt 0 NaN ["BBQ grill", "Shampoo", "Smoke alarm", "TV", ... $147.00

3254 rows × 9 columns

Our dataframe is much more readable and manageable now that we have removed the less relevant columns! Now we can focus on the variables that we will actually be analyzing.

Combining the Dataframes into One

Now, we want to merge the two separate dataframes from 2020 and 2019 into one dataframe so that we can have columns showing the rental's price in 2019 and in 2020 in a single dataframe.

We will be doing an inner join, so we will only be keeping the rentals with the same unique id that were active in both 2019 and 2020 and excluding rentals that were only active in 2019 or only active in 2020. That way, we can easily compare how a rental changed its price from 2019 to 2020.

In [19]:
# Drops unnecessary columns from the 2019 dataframe since we are mainly concerned about adding the 2019 prices as a column
df_2019 = df_2019[['id', 'price']] 

# Merges based on rentals with the same id
df = df_2020.merge(df_2019, how="inner", on='id')

# Renames the columns appropriately
df.rename(columns={'price_x':'price_2020', 'price_y':'price_2019'}, inplace=True)

df
Out[19]:
id neighborhood latitude longitude room_type num_reviews rating amenities price_2020 price_2019
0 3781 East Boston 42.36413 -71.02991 Entire home/apt 17 99.0 ["Cable TV", "Shampoo", "Smoke alarm", "TV", "... $150.00 $125.00
1 5506 Roxbury 42.32981 -71.09559 Entire home/apt 107 95.0 ["Cable TV", "Shampoo", "Smoke alarm", "TV", "... $145.00 $145.00
2 6695 Roxbury 42.32994 -71.09351 Entire home/apt 115 96.0 ["Cable TV", "Shampoo", "Smoke alarm", "TV", "... $169.00 $169.00
3 10730 Downtown 42.35840 -71.06185 Entire home/apt 32 96.0 ["Cable TV", "Smoke alarm", "TV", "Bed linens"... $81.00 $150.00
4 10813 Back Bay 42.35061 -71.08787 Entire home/apt 10 99.0 ["Cable TV", "Shampoo", "Smoke alarm", "TV", "... $87.00 $179.00
... ... ... ... ... ... ... ... ... ... ...
2053 39445807 Back Bay 42.34645 -71.07803 Entire home/apt 2 100.0 ["Shampoo", "Smoke alarm", "TV", "Bed linens",... $125.00 $200.00
2054 39446774 Back Bay 42.34663 -71.07915 Entire home/apt 1 100.0 ["Shower gel", "Cable TV", "Shampoo", "Smoke a... $148.00 $245.00
2055 39447297 Back Bay 42.34635 -71.07792 Entire home/apt 0 NaN ["Garden or backyard", "Shampoo", "Smoke alarm... $148.00 $245.00
2056 39447462 Back Bay 42.34603 -71.07920 Entire home/apt 0 NaN ["Shampoo", "Smoke alarm", "TV", "Private entr... $148.00 $245.00
2057 39447565 Back Bay 42.34834 -71.08152 Entire home/apt 1 100.0 ["Shampoo", "Smoke alarm", "TV", "Baking sheet... $148.00 $245.00

2058 rows × 10 columns

We now have a combined dataframe showing the price for each rental in 2019 and in 2020. We can see that 2,058 rentals were active in both 2019 and 2020.
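To see how many listings the inner join leaves out, one option is an outer join with pandas' `indicator=True` flag, which labels each row by the table it came from. The sketch below runs on small made-up tables (`toy_2020` and `toy_2019` are illustrative names, not the real data).

```python
import pandas as pd

# Toy stand-ins for the two yearly tables (ids and prices are made up)
toy_2020 = pd.DataFrame({"id": [1, 2, 3],
                         "price": ["$100.00", "$80.00", "$150.00"]})
toy_2019 = pd.DataFrame({"id": [2, 3, 4, 5],
                         "price": ["$90.00", "$150.00", "$60.00", "$75.00"]})

# An outer join keeps every id; indicator=True adds a "_merge" column
# saying whether the row came from the left table, the right table, or both
both_years = toy_2020.merge(toy_2019, on="id", how="outer",
                            suffixes=("_2020", "_2019"), indicator=True)

# "both" rows are what an inner join keeps; the rest were listed in only one year
overlap_counts = both_years["_merge"].value_counts().to_dict()

print(overlap_counts)
```

Here `left_only` counts listings that appear only in the 2020 table and `right_only` those that appear only in the 2019 table.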

Converting Price Column to Int

We convert the price columns into type int so that they can be treated as numbers. Currently, the prices are stored as strings like '$1,000.00', so we remove the dollar sign and any commas before converting; for example, '$1,000.00' becomes 1000.

In [20]:
# Cleans both price columns
for column in ['price_2020', 'price_2019']:
    # Removes the leading $ and any commas, then parses each price string as a float
    as_float = df[column].str.replace(r'[$,]', '', regex=True).astype(float)
    df[column] = as_float.astype(int) # Stores the prices as whole numbers


df.head() # Displays the first 5 listings
Out[20]:
id neighborhood latitude longitude room_type num_reviews rating amenities price_2020 price_2019
0 3781 East Boston 42.36413 -71.02991 Entire home/apt 17 99.0 ["Cable TV", "Shampoo", "Smoke alarm", "TV", "... 150 125
1 5506 Roxbury 42.32981 -71.09559 Entire home/apt 107 95.0 ["Cable TV", "Shampoo", "Smoke alarm", "TV", "... 145 145
2 6695 Roxbury 42.32994 -71.09351 Entire home/apt 115 96.0 ["Cable TV", "Shampoo", "Smoke alarm", "TV", "... 169 169
3 10730 Downtown 42.35840 -71.06185 Entire home/apt 32 96.0 ["Cable TV", "Smoke alarm", "TV", "Bed linens"... 81 150
4 10813 Back Bay 42.35061 -71.08787 Entire home/apt 10 99.0 ["Cable TV", "Shampoo", "Smoke alarm", "TV", "... 87 179

The price columns are now properly encoded as integers! Note that all of the prices were whole numbers, which is why we did not need to keep them as type float.
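The claim that every price is a whole number can be checked before casting, so that no cents are silently dropped. The following is a minimal sketch on made-up price strings; `raw_prices` is an illustrative name.

```python
import pandas as pd

# Made-up price strings in the same "$1,000.00" style as the dataset
raw_prices = pd.Series(["$150.00", "$1,250.00", "$99.00"])

# Strips "$" and "," and parses as float first, so any cents would be preserved
as_float = raw_prices.str.replace(r"[$,]", "", regex=True).astype(float)

# True only if every price has no fractional part, i.e. casting to int loses nothing
all_whole = bool((as_float % 1 == 0).all())

# Casts to int only when it is safe to do so
as_int = as_float.astype(int) if all_whole else as_float

print(all_whole, as_int.tolist())
```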

Adding a Column for Number of Amenities

We want to do some analysis to see if the number of amenities provided by a rental affects the price. Since the dataset has a column listing the amenities provided, we want to add a column that counts the number of amenities provided from that list. However, since the list of amenities is currently stored as a string, we need to do some pre-processing so that it is easier to count the number of amenities.

In [21]:
# Converts each amenities string, e.g. '["Cable TV", "Shampoo"]', into a list of clean names:
# drops the surrounding brackets, splits on commas, then strips spaces and quotes from each name
df['amenities'] = df['amenities'].apply(
    lambda x: [amenity.strip().strip('"') for amenity in x[1:-1].split(',')])
# Adds a column that contains the length of the list of amenities
df['num_amenities'] = df['amenities'].apply(len)

df.head() # Displays the first 5 listings
Out[21]:
id neighborhood latitude longitude room_type num_reviews rating amenities price_2020 price_2019 num_amenities
0 3781 East Boston 42.36413 -71.02991 Entire home/apt 17 99.0 ["Cable TV", "Shampoo", "Smoke alarm", "TV"... 150 125 30
1 5506 Roxbury 42.32981 -71.09559 Entire home/apt 107 95.0 ["Cable TV", "Shampoo", "Smoke alarm", "TV"... 145 145 30
2 6695 Roxbury 42.32994 -71.09351 Entire home/apt 115 96.0 ["Cable TV", "Shampoo", "Smoke alarm", "TV"... 169 169 30
3 10730 Downtown 42.35840 -71.06185 Entire home/apt 32 96.0 ["Cable TV", "Smoke alarm", "TV", "Bed line... 81 150 30
4 10813 Back Bay 42.35061 -71.08787 Entire home/apt 10 99.0 ["Cable TV", "Shampoo", "Smoke alarm", "TV"... 87 179 23
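Since the amenities strings look like JSON arrays, another way to parse them is with `json.loads`. This is only a sketch under that assumption; the sample string below is made up, and "Washer, dryer" is a hypothetical amenity name chosen to show what happens when a name contains a comma.

```python
import json

# A made-up amenities string in the same bracketed, double-quoted style as the dataset
raw = '["Cable TV", "Shampoo", "Washer, dryer", "Wifi"]'

# Parsing as JSON yields a list of clean names and keeps "Washer, dryer" as one item
parsed = json.loads(raw)

# Splitting on commas instead breaks that item in two
naive = raw[1:-1].split(',')

print(len(parsed), len(naive))
```

If no amenity name contains a comma, both approaches give the same count.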

Dropping Missing Values

Lastly, we need to remove any rows with missing values to avoid errors later on.

In [22]:
df.dropna(inplace=True)
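It can be worth checking which columns contain the missing values before dropping rows. In the rows displayed above, rating is NaN for listings with zero reviews, so dropping missing values also removes listings that have never been reviewed. The sketch below shows the check on a small made-up table (`toy` is an illustrative name).

```python
import pandas as pd

# Made-up listings; rating is missing where a listing has no reviews yet
toy = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "num_reviews": [17, 0, 32, 0],
    "rating": [99.0, None, 96.0, None],
    "price_2020": [150, 47, 81, 44],
})

# Counts the missing values in each column
missing_per_column = toy.isna().sum()

# Drops every row that has at least one missing value
cleaned = toy.dropna()

print(missing_per_column.to_dict(), len(toy) - len(cleaned))
```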

**Exploratory Data Analysis**

Now that we have finished with our Data Cleaning and Curation, we can move on to Exploratory Data Analysis. During this phase, we try to gain more insight into the dataset through visualizations and determine whether there are any outliers that we need to account for. Primarily, we want to see whether price varies with room type, neighborhood, rating, number of amenities, or number of reviews, since a variable that price varies with could be used to predict price.

General Exploration

Plotting the Locations of Rentals

To get a better sense of the location of rentals, we produce a map with Marker Clusters. The rentals are plotted as clusters, and we can zoom in on a specific cluster to see the individual marker locations. Clicking on a marker opens a popup that displays the URL of the listing, which can be typed into a browser to go to the listing's page. (Note that some of the listings may no longer be active, but most of them should be.)

In [23]:
import folium
from folium.plugins import MarkerCluster
from folium import Marker

location_map = folium.Map(location=[42.3, -71.057083], zoom_start=12, tiles='cartodbpositron') # Defines the base map

clusters = MarkerCluster() 

# Iterates over every listing in our table
for index, row in df.iterrows():
  # Adds a marker for the listing at the corresponding coordinates and showing the URL when clicked
  popup = 'https://www.airbnb.com/rooms/' + str(row['id'])
  clusters.add_child(Marker([row['latitude'], row['longitude']], popup=popup))
location_map.add_child(clusters) 

location_map # Displays the map with the marker clusters
Out[23]:
[Interactive map: locations of the Boston rentals shown as marker clusters]

From this, we can see that the largest clusters are near central Boston (where the word "BOSTON" is labeled on the map), which makes sense as that is likely the most populous area. As we move farther away from central Boston, we see that rentals are more sparse, particularly towards the south which has smaller clusters.

What Amenities are Most Common?

Since we have the data, let us see which amenities are most commonly provided to guests.

In [24]:
from collections import Counter
import numpy as np
import plotly.express as px

# Defines an empty Counter to keep track of the counts for each amenity
all_amenities = Counter()

# Iterates over every listing, adding to the counts for the respective amenities provided
for amenities in df['amenities']:
    all_amenities.update(amenities)

# Reads the amenities with the top 10 counts into a dataframe
most_amenities = all_amenities.most_common(10)
amenities_df = pandas.DataFrame(most_amenities)
amenities_df.columns = ['Amenities', 'Frequency']

# Produces the bar plot of the top 10 amenities with their frequencies
amenities_df.sort_values(by='Frequency', inplace=True) # Sorts so that the most common amenity is at the top of the plot
fig = px.bar(amenities_df, x='Frequency', y='Amenities', orientation='h', title='10 Most Common Amenities Provided')
fig.show()

As we can see, the top 10 amenities include Wifi, Heating, Smoke alarm, and Carbon monoxide alarm. The "Essentials" amenity refers to basic items like toilet paper, soap, towels, and linens. Since our data contains around 2,000 rentals and the frequencies of all these amenities are over 1,000, over half of the rentals in our data provide each of these amenities!
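The "over half" reasoning can be made explicit by dividing each amenity's count by the number of listings. The sketch below does this on made-up amenity lists; `listings` is an illustrative name, not the tutorial's data.

```python
from collections import Counter

# Made-up amenity lists for four listings
listings = [
    ["Wifi", "Heating", "Essentials"],
    ["Wifi", "Heating"],
    ["Wifi", "Kitchen"],
    ["Heating", "Essentials"],
]

# Counts how many listings offer each amenity
counts = Counter()
for amenities in listings:
    counts.update(set(amenities))  # set() so a repeated name counts once per listing

# Converts each count into the share of listings that offer the amenity
shares = {name: count / len(listings) for name, count in counts.most_common()}

print(shares)
```

A share above 0.5 means that more than half of the listings provide that amenity.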

Exploring Price

Visualizing the Distribution of Prices

Now, the rest of our Exploratory Data Analysis will focus on the data in relation to rental prices, which is the focus of our project. First, we produce a box plot of the distribution of rental prices and check for major outliers.

In [25]:
import plotly.graph_objects as go

fig = go.Figure()
fig.add_trace(go.Box(y=df['price_2019'].values, name='2019')) # Box plot for 2019 prices
fig.add_trace(go.Box(y=df['price_2020'].values, name='2020')) # Box plot for 2020 prices

# Sets appropriate titles
fig.update_layout(showlegend=False, 
                  title='Distribution of Rental Price in 2019 and 2020',
                  xaxis_title='Year',
                  yaxis_title='Price ($)')

fig.show()

From these box plots, we can clearly see that there are major outliers: a few rentals have very high prices, which distorts the appearance of our plots. Some outliers have prices of about $4000 per night, while the majority of the rentals appear to have prices below $500 per night.

Excluding Outliers

Because there are such extreme outliers, as shown by the box plots above, we will trim our dataset to remove them and reduce the impact they would have on our visualizations and our machine learning. We do not expect the removal of these outliers to be too consequential, as they are luxury properties with very high prices, and we are aiming to help people find cheaper Airbnbs. Furthermore, if we did not remove these outliers, they would skew our visualizations and predictive analysis and make them much less meaningful.

To remove outliers, we use the common method based on the Interquartile Range (IQR). The IQR is the difference between the 75th percentile (Q3) and the 25th percentile (Q1). Outliers are defined as values greater than Q3 + 1.5*IQR or less than Q1 - 1.5*IQR.

In [26]:
trim_df = df

# Removes outliers for the price columns for both years
for column in ['price_2019', 'price_2020']:
  # Calculates the IQR
  Q1 = trim_df[column].quantile(0.25)
  Q3 = trim_df[column].quantile(0.75)
  IQR = Q3 - Q1

  # Uses the IQR to remove outliers and update the dataframe
  trim_df = trim_df[trim_df[column] >= Q1 - IQR * 1.5]
  trim_df = trim_df[trim_df[column] <= Q3 + IQR * 1.5]

trim_df # Outputs the dataset without the major outliers
Out[26]:
id neighborhood latitude longitude room_type num_reviews rating amenities price_2020 price_2019 num_amenities
0 3781 East Boston 42.36413 -71.02991 Entire home/apt 17 99.0 ["Cable TV", "Shampoo", "Smoke alarm", "TV"... 150 125 30
1 5506 Roxbury 42.32981 -71.09559 Entire home/apt 107 95.0 ["Cable TV", "Shampoo", "Smoke alarm", "TV"... 145 145 30
2 6695 Roxbury 42.32994 -71.09351 Entire home/apt 115 96.0 ["Cable TV", "Shampoo", "Smoke alarm", "TV"... 169 169 30
3 10730 Downtown 42.35840 -71.06185 Entire home/apt 32 96.0 ["Cable TV", "Smoke alarm", "TV", "Bed line... 81 150 30
4 10813 Back Bay 42.35061 -71.08787 Entire home/apt 10 99.0 ["Cable TV", "Shampoo", "Smoke alarm", "TV"... 87 179 23
... ... ... ... ... ... ... ... ... ... ... ...
2051 39444375 Back Bay 42.34670 -71.07945 Entire home/apt 1 100.0 ["Shampoo", "Smoke alarm", "TV", "Private e... 130 200 22
2052 39444706 Back Bay 42.34536 -71.07852 Entire home/apt 7 100.0 ["Garden or backyard", "Cable TV", "Shampoo"... 148 200 36
2053 39445807 Back Bay 42.34645 -71.07803 Entire home/apt 2 100.0 ["Shampoo", "Smoke alarm", "TV", "Bed linen... 125 200 29
2054 39446774 Back Bay 42.34663 -71.07915 Entire home/apt 1 100.0 ["Shower gel", "Cable TV", "Shampoo", "Smok... 148 245 33
2057 39447565 Back Bay 42.34834 -71.08152 Entire home/apt 1 100.0 ["Shampoo", "Smoke alarm", "TV", "Baking sh... 148 245 29

1692 rows × 11 columns
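To make the IQR rule concrete, here is a small sketch on six made-up prices, one of which is far above the rest. It applies the same quantile-based bounds to a single column; `prices` is an illustrative name.

```python
import pandas as pd

# Made-up nightly prices, with one very expensive listing
prices = pd.Series([50, 80, 100, 120, 150, 1000])

# Quartiles and interquartile range
Q1 = prices.quantile(0.25)
Q3 = prices.quantile(0.75)
IQR = Q3 - Q1

# Bounds outside of which a value counts as an outlier
lower = Q1 - 1.5 * IQR
upper = Q3 + 1.5 * IQR

# Keeps only the prices inside the bounds
kept = prices[(prices >= lower) & (prices <= upper)]

print(Q1, Q3, lower, upper, kept.tolist())
```

Note that in the loop above, the quartiles for price_2020 are computed on the dataframe after the price_2019 outliers have already been removed, so the order of the columns can slightly affect which rows are kept.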

After removing the outliers, we now have 1692 listings to analyze. Going forward, we will be using this data without outliers for our visualizations and machine learning. We now produce our box plots of rental price again.

In [27]:
fig = go.Figure()
fig.add_trace(go.Box(y=trim_df['price_2019'].values, name='2019')) # Box plot for 2019 prices
fig.add_trace(go.Box(y=trim_df['price_2020'].values, name='2020')) # Box plot for 2020 prices

# Sets appropriate titles
fig.update_layout(showlegend=False, 
                  title='Distribution of Rental Price in 2019 and 2020',
                  xaxis_title='Year',
                  yaxis_title='Price ($)')

fig.show()

Now our box plots look much better because we have removed the extreme outliers. From these plots, the median rental price is lower in 2020 than in 2019: it was $125 in 2019 and $99 in 2020. Additionally, the range of prices in 2020 is slightly smaller.
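As a quick worked calculation, the two medians read off the box plots can be turned into an absolute and a relative change. This is only arithmetic on those two reported numbers, not a new analysis of the data.

```python
# Median nightly prices read off the trimmed box plots in this tutorial
median_2019 = 125
median_2020 = 99

# Absolute change in dollars and relative change in percent from 2019 to 2020
abs_change = median_2020 - median_2019
pct_change = round(abs_change / median_2019 * 100, 1)

print(abs_change, pct_change)
```

In other words, the median price dropped by $26, which is roughly a 21% decrease.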

Does Price Vary Depending on the Neighborhood?

Now, we want to see whether price varies across the different neighborhoods of Boston. To do so, we produce a choropleth map, which colors each neighborhood according to the average price of the rentals in that neighborhood. Darker colors correspond to a higher average rental price, while lighter colors mean a lower average rental price. With a choropleth map, we can easily visualize the average price for each neighborhood in Boston and how the averages compare to one another. Note that hovering over each region will display the name of the neighborhood!

In [30]:
# Defines a function that draws the choropleth map for the specified year
def choropleth_map(year):
  # Calculates the average rental price for each neighborhood
  means = trim_df.groupby('neighborhood')['price_' + year].mean()

  # Defines the base map
  price_map = folium.Map(location=[42.3, -71.057083], tiles='cartodbpositron', zoom_start=11)

  # Defines the choropleth map's properties
  choropleth = folium.Choropleth(
      geo_data = county_geo, 
      data = means,
      key_on = 'feature.properties.neighbourhood',
      fill_color ='YlGnBu',
      fill_opacity = 0.7,
      line_opacity = 0.2,
      legend_name='Price ($)').add_to(price_map)

  choropleth.geojson.add_child(folium.features.GeoJsonTooltip(['neighbourhood'],labels=False))

  # Sets the title for the map
  text = 'Choropleth Map of Airbnb Prices in Boston (' + year + ')'
  title_html = '''<p align="center" style="font-size:18px">{}</p>'''.format(text)   
  price_map.get_root().html.add_child(folium.Element(title_html))

  display(price_map) # Displays the map
In [31]:
choropleth_map('2019') # Displays the choropleth map for 2019

In 2019, the most expensive neighborhoods were North End, Downtown, West End, Chinatown, Back Bay, and Fenway. We can see that the more expensive neighborhoods appear to be those near central Boston, with the less expensive districts being those like Hyde Park or Brighton which are closer to the outskirts. Note that the Harbor Islands neighborhood is colored black because there were no rentals with that location, which makes sense because they are quite small and not quite part of the city.

We now produce the choropleth map of Airbnb prices in 2020.

In [32]:
choropleth_map('2020') # Displays the choropleth map for 2020

In 2020, the Leather District and West End were the most expensive, followed by North End, Chinatown, Fenway, Charlestown, and South Boston Waterfront. Again, the trend of central Boston (meaning the area around Downtown) being more expensive on average remains the same.

In all, from these choropleth maps, we can clearly see that the price of Airbnb rentals does vary by neighborhood, since the neighborhoods are colored differently according to the average price of rentals located there. Overall, the most expensive neighborhoods on average tend to be the ones that are closest to central Boston, with the less expensive neighborhoods being the ones further away. However, the set of most expensive neighborhoods does vary a bit between 2019 and 2020, suggesting that the distribution of prices may have changed from 2019 to 2020.

Does Price Vary Depending on Room Type?

We now want to explore whether the type of room of the rental affects price. To do so, for each type of room, we produce a box plot of the prices for listings that are of that room type.

In [33]:
fig = go.Figure()

# Makes box plots for 2019 and then 2020
fig.add_trace(go.Box(y=trim_df['price_2019'], x=trim_df['room_type'], boxpoints=False, name='2019'))
fig.add_trace(go.Box(y=trim_df['price_2020'], x=trim_df['room_type'], boxpoints=False, name='2020'))

# Sets appropriate titles
fig.update_layout(boxmode='group',
                  title='Distribution of Rental Price for Different Room Types',
                  xaxis_title='Room Type',
                  yaxis_title='Price ($)')

fig.show()

From this, we can see that the type of room of the listing does appear to affect the price. The distributions for each room type across 2019 and 2020 appear to be largely the same. Shared rooms appear to be the cheapest and hotel rooms appear to be the most expensive on average, which is what we would expect. There are some high outliers for entire home/apartment and private room listings, which is understandable considering there may be luxury homes or private rooms, but it is less likely for there to be something like a luxury room that is shared.

Does Price Vary Based on Other Variables?

Lastly, let's see if the price of a rental varies based on other variables like the number of reviews, number of amenities, or rating. To do so, we will make a simple scatterplot of the 2020 price against each of these variables.

In [34]:
from plotly.subplots import make_subplots

# Defines that we want a row of 3 subplots
fig = make_subplots(rows=1, cols=3, horizontal_spacing=0.1,
                    subplot_titles=("Price vs. # of Reviews", "Price vs. # of Amenities", "Price vs. Rating"))

# Plots price vs. # of reviews
fig.add_trace(go.Scatter(x=trim_df['num_reviews'], y=trim_df['price_2020'], mode='markers'), 
              row=1, col=1)

# Plots price vs. # of amenities
fig.add_trace(go.Scatter(x=trim_df['num_amenities'], y=trim_df['price_2020'], mode='markers'), 
              row=1, col=2)

# Plots price vs. rating
fig.add_trace(go.Scatter(x=trim_df['rating'], y=trim_df['price_2020'], mode='markers'), 
              row=1, col=3)

# Sets appropriate titles
fig.update_layout(showlegend=False)
fig.update_yaxes(title_text="Price ($)", row=1, col=1)
fig.update_xaxes(title_text="# of Reviews", row=1, col=1)
fig.update_xaxes(title_text="# of Amenities", row=1, col=2)
fig.update_xaxes(title_text="Rating", row=1, col=3)

From these plots, we can see that the majority of rentals appear to have fewer than 200 reviews, fewer than 40 amenities, and ratings greater than 80. However, there do not seem to be any clear trends between price and the number of reviews, number of amenities, or rating. For a given number of reviews, number of amenities, or rating, there are rentals at a wide range of prices. Nevertheless, we will include these variables in our predictive analysis to see more closely whether there may actually be a relationship, even if it is slight.
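
One way to put a number on "no clear trend" is the Pearson correlation coefficient, which ranges from -1 to 1 and is close to 0 when there is little linear relationship. The sketch below computes it by hand on a few made-up values; the arrays and the helper name pearson_r are assumptions for illustration only and are not taken from trim_df.

```python
import numpy as np

# Hypothetical stand-in values; in the notebook these would be columns of trim_df
price = np.array([99.0, 150.0, 75.0, 210.0, 120.0, 60.0])
num_reviews = np.array([12.0, 48.0, 150.0, 3.0, 80.0, 25.0])

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length arrays."""
    x_centered = x - x.mean()
    y_centered = y - y.mean()
    return float((x_centered * y_centered).sum()
                 / np.sqrt((x_centered ** 2).sum() * (y_centered ** 2).sum()))

r = pearson_r(price, num_reviews)
print(round(r, 3))
```

On the real data, calling trim_df[['price_2020', 'num_reviews', 'num_amenities', 'rating']].corr() should give the same kind of summary for all of the variables at once.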

**Hypothesis Testing and Machine Learning**

Now that we have done our Exploratory Data Analysis, we will try to do some hypothesis testing and machine learning to more concretely answer whether prices have changed significantly since 2019 and how well we can predict prices of Airbnb rentals.

Are the prices in 2020 significantly different from 2019?

Let's determine whether there is a statistically significant difference between prices before and during the pandemic using a paired t-test of the 2019 prices and 2020 prices. In a paired t-test, each subject (here, each rental) is measured twice, and the test determines whether the mean difference between the two sets of measurements is 0.

The null hypothesis and alternative hypothesis that we will be testing are as follows:

$ H_{0} $ = The mean difference between the 2019 and 2020 prices is 0.
$ H_{a} $ = The mean difference between the 2019 and 2020 prices is not 0.

If we can reject the null hypothesis from the results of the t-test, then we can say that the prices in 2020 for Airbnb rentals are significantly different from the prices in 2019.

In [35]:
from scipy import stats

# Performs the t-test and prints the result of the test
result = stats.ttest_rel(trim_df['price_2020'], trim_df['price_2019'])
print('Test Result: \n' + 
      't-statistic= ' + str(np.round(result.statistic, decimals=3)) + '\n' +
      'p-value= ' + str(result.pvalue) + ' ≈ ' + str(np.round(result.pvalue, decimals=3)) + '\n')

# Prints the mean prices in 2020 and 2019 for comparison
print("Mean 2019 Price ($): " + str(np.round(trim_df['price_2019'].mean(), decimals=2)))
print("Mean 2020 Price ($): " + str(np.round(trim_df['price_2020'].mean(), decimals=2)))
Test Result: 
t-statistic= -12.708
p-value= 2.1061464045654373e-35 ≈ 0.0

Mean 2019 Price ($): 131.15
Mean 2020 Price ($): 115.44

The p-value ($ 2.106 * 10^{-35}$) resulting from the t-test is extremely close to 0, meaning that we can reject the null hypothesis. Accordingly, we have sufficient evidence to conclude that the Boston Airbnb prices in 2020 are significantly different from the prices in 2019. Additionally, as we can see, the mean price of Airbnb rentals in 2019 was \$131 per night, which is greater than the mean price in 2020 of \$115 per night, suggesting that rental prices in 2020 were cheaper on average.
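
To make the test less of a black box, here is a minimal sketch of how the paired t-statistic is computed: take the per-listing difference, then divide the mean difference by its standard error. The five price pairs are invented for illustration and the helper name paired_t_statistic is hypothetical; stats.ttest_rel, used in the cell above, computes this same statistic and additionally reports the p-value.

```python
import math
import statistics

# Hypothetical nightly prices for the same five listings in each year
price_2019 = [120.0, 95.0, 200.0, 150.0, 80.0]
price_2020 = [100.0, 90.0, 180.0, 155.0, 70.0]

def paired_t_statistic(after, before):
    """t = mean(d) / (sd(d) / sqrt(n)), where d = after - before per listing."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (divides by n - 1)
    return mean_diff / (sd_diff / math.sqrt(n))

t_stat = paired_t_statistic(price_2020, price_2019)
print(round(t_stat, 3))  # -2.108
```

As in our test above, passing the 2020 prices first means a negative t-statistic corresponds to prices that are lower in 2020.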

How Well Can We Predict Prices?

Now we will use machine learning models to try to predict Airbnb rental price based on a variety of variables. We will fit a linear regression model to try to predict prices in 2019 and 2020 based on the variables we have recorded for each rental. For our linear regression models, we will be using Ordinary Least Squares (OLS), which chooses the model coefficients that minimize the sum of the squared differences between the actual prices and the prices predicted by the model. First, we'll compare the 2019 and 2020 models and then look more closely at the 2020 model to discuss the takeaways.
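
As a small illustration of what "least squares" means, the sketch below fits a one-variable model with NumPy and checks that moving away from the fitted coefficients makes the sum of squared errors larger. The five data points are made up for this example, and this is not how the models in this tutorial are fit (we use statsmodels for that).

```python
import numpy as np

# Hypothetical data: price as a function of the number of amenities
num_amenities = np.array([5.0, 10.0, 15.0, 20.0, 25.0])
price = np.array([60.0, 75.0, 85.0, 110.0, 120.0])

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones_like(num_amenities), num_amenities])

# Least-squares solution: the coefficients minimizing sum((price - X @ beta) ** 2)
beta, *_ = np.linalg.lstsq(X, price, rcond=None)
intercept, slope = beta

def sum_squared_residuals(b):
    """Sum of squared differences between actual and predicted prices."""
    residuals = price - X @ b
    return float(residuals @ residuals)

best = sum_squared_residuals(beta)
# Nudging the fitted coefficients away from the solution increases the error
worse = sum_squared_residuals(beta + np.array([1.0, -0.1]))
print(round(float(intercept), 2), round(float(slope), 2), best < worse)
```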

Basic Model

Comparing Regression Models from 2019 and 2020

Using the statsmodels package's built-in function for OLS linear regression, we will produce a multiple linear regression model that attempts to predict price based on the rental's neighborhood, room type, number of reviews, rating, and number of amenities. After fitting these models to predict price in 2019 and 2020, we will output the $R^{2}$ value and the p-value of the F-test of overall significance, both of which statsmodels computes for us.

The $R^{2}$ value is the percentage of variation in price that can be explained by the predictor variables. In other words, it is a "goodness-of-fit" measure for linear regression models, meaning that it indicates the strength of our linear model in predicting price, with higher values indicating a stronger relationship.
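
Concretely, $R^{2} = 1 - SS_{res} / SS_{tot}$, where $SS_{res}$ is the sum of squared prediction errors and $SS_{tot}$ is the sum of squared deviations of the actual prices from their mean. The sketch below applies this formula to a handful of invented actual and predicted prices; the numbers and the helper name r_squared are assumptions for illustration.

```python
# Hypothetical actual prices and model predictions for five listings
actual = [100.0, 150.0, 80.0, 200.0, 120.0]
predicted = [110.0, 140.0, 90.0, 180.0, 130.0]

def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)              # total variation
    return 1 - ss_res / ss_tot

r2 = r_squared(actual, predicted)
print(round(r2, 3))  # 0.909
```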

The F-test of overall significance tests whether our model using the independent variables of neighborhood, room type, number of reviews, rating, and number of amenities, is significantly better than a model that does not use any independent variables.

In [36]:
from statsmodels.formula.api import ols
import numpy as np

# Defines that we want a multiple linear regression model to predict 2019 price based on the listed variables
model_2019 = ols('price_2019 ~ neighborhood + room_type + num_reviews + rating + num_amenities', data=trim_df)
model_2019 = model_2019.fit() # Fits the model 

# Repeats for predicting 2020 price
model_2020 = ols('price_2020 ~ neighborhood + room_type + num_reviews + rating + num_amenities', data=trim_df)
model_2020 = model_2020.fit()

# Prints out the computed R-squared value and F-test p-value of both models
print('2019 model:\n' + 
      'R-squared value: ' + str(np.round(model_2019.rsquared, decimals=3)) + '\n'
      'F-test p-value: ' + str(model_2019.f_pvalue) + ' ≈ ' + str(np.round(model_2019.f_pvalue, decimals=3)) + '\n')

print('2020 model:\n' + 
      'R-squared value: ' + str(np.round(model_2020.rsquared, decimals=3)) + '\n'
      'F-test p-value: ' + str(model_2020.f_pvalue) + ' ≈ ' + str(np.round(model_2020.f_pvalue, decimals=3)))
2019 model:
R-squared value: 0.539
F-test p-value: 1.5705993210554638e-253 ≈ 0.0

2020 model:
R-squared value: 0.385
F-test p-value: 5.208932825861506e-152 ≈ 0.0

The $R^{2}$ values for our 2019 and 2020 linear regression models indicate that the models predict the price moderately well. More notably, the R-squared value for the 2019 model (0.539) is greater than the value for the 2020 model (0.385), which indicates that the linear regression model for predicting prices in 2019 performs better than the model for predicting prices in 2020. We will discuss the possible implications of this further in the Conclusion section.

The p-value resulting from the F-test for both models is extremely close to 0, which provides further support that our models predicting price based on neighborhood, room type, number of reviews, rating, and number of amenities are significantly better than a model that does not use any independent variables. In other words, using these independent variables significantly improved the models' ability to predict prices, which makes sense considering we saw above in our EDA phase that price appears to vary with neighborhood and room type.

Takeaways from 2020 Model for Predicting Prices

Now, we will specifically look at the coefficients for the linear regression model predicting prices in 2020 to see if we can gain any insight into how to get a cheaper Airbnb in 2020.

In [37]:
# Prints out the rounded coefficients for each variable of the 2020 model in sorted order
model_2020.params.sort_values().round(decimals=2)
Out[37]:
room_type[T.Shared room]                  -102.29
room_type[T.Private room]                  -56.04
neighborhood[T.Hyde Park]                  -20.43
neighborhood[T.Longwood Medical Area]       -9.84
neighborhood[T.Mattapan]                    -6.10
neighborhood[T.Brighton]                    -1.32
num_reviews                                 -0.06
rating                                       0.94
num_amenities                                1.32
Intercept                                    2.13
neighborhood[T.Roxbury]                      3.77
neighborhood[T.Roslindale]                   6.09
neighborhood[T.West Roxbury]                 6.23
neighborhood[T.Dorchester]                  13.05
neighborhood[T.Downtown]                    13.82
neighborhood[T.Jamaica Plain]               19.13
neighborhood[T.Beacon Hill]                 22.27
neighborhood[T.East Boston]                 24.71
neighborhood[T.South End]                   25.99
neighborhood[T.Back Bay]                    27.52
neighborhood[T.Mission Hill]                27.87
neighborhood[T.South Boston]                34.64
neighborhood[T.Bay Village]                 37.37
neighborhood[T.Chinatown]                   44.16
neighborhood[T.North End]                   51.16
neighborhood[T.Charlestown]                 51.54
neighborhood[T.Fenway]                      53.24
neighborhood[T.Leather District]            54.88
neighborhood[T.West End]                    58.15
neighborhood[T.South Boston Waterfront]     64.40
room_type[T.Hotel room]                     68.67
dtype: float64

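From this list of coefficients, we can see that variables associated with negative coefficients decrease the predicted price of the rental while variables associated with positive coefficients increase the predicted price. Note that each neighborhood and room type coefficient is measured relative to a baseline category that does not appear in the list; for room type, the baseline is an entire home/apartment.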

When looking at the type of room, we see that shared rooms are least expensive followed by private rooms, since they have the most negative coefficients, which makes sense based on the box plots we did showing how price varies depending on room type. On the other hand, hotel rooms are most expensive, which is also reasonable based on the box plots and intuition.

When looking at the neighborhood, South Boston Waterfront, West End, and Leather District appear to be the three most expensive while Hyde Park, Longwood Medical Area, and Mattapan appear to be the three least expensive neighborhoods.

The coefficient for the number of reviews is quite close to 0 (-0.06), indicating that each additional review changes the predicted price by only about 6 cents, so it has very little effect on the price.

The coefficient for the rating variable is close to 1 (0.94), meaning that a rental with a perfect rating of 100 is expected to be approximately \$94 more expensive to rent than an otherwise identical rental with a rating of 0. Of course, this is a very extreme example and the coefficient is relatively small, so rating does not have too much of an impact either.

Lastly, the coefficient for number of amenities is also close to 1 (1.32), meaning that each additional amenity increases the predicted rental price by about \$1.32.
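
To see how these coefficients combine into a single prediction, here is a small worked example that adds them up by hand. The coefficients are the rounded values from the output above; the listing itself (a private room in Back Bay with 50 reviews, a rating of 95, and 20 amenities) is hypothetical.

```python
# Rounded coefficients copied from the 2020 model output above
coef = {
    'Intercept': 2.13,
    'neighborhood[T.Back Bay]': 27.52,
    'room_type[T.Private room]': -56.04,
    'num_reviews': -0.06,
    'rating': 0.94,
    'num_amenities': 1.32,
}

# Hypothetical listing: private room in Back Bay, 50 reviews, rating 95, 20 amenities
predicted_price = (
    coef['Intercept']
    + coef['neighborhood[T.Back Bay]']    # dummy variable is 1 for this neighborhood
    + coef['room_type[T.Private room]']   # dummy variable is 1 for this room type
    + coef['num_reviews'] * 50
    + coef['rating'] * 95
    + coef['num_amenities'] * 20
)
print(round(predicted_price, 2))  # 86.31
```

With the fitted model in hand, passing a one-row DataFrame with the same column names to model_2020.predict should give essentially the same number, up to the rounding of the coefficients.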

In [38]:
model_2020.summary()
Out[38]:
OLS Regression Results
Dep. Variable:      price_2020          R-squared:            0.385
Model:              OLS                 Adj. R-squared:       0.374
Method:             Least Squares       F-statistic:          34.70
Date:               Sun, 20 Dec 2020    Prob (F-statistic):   5.21e-152
Time:               01:48:16            Log-Likelihood:       -9027.8
No. Observations:   1692                AIC:                  1.812e+04
Df Residuals:       1661                BIC:                  1.829e+04
Df Model:           30
Covariance Type:    nonrobust

                                              coef   std err         t     P>|t|    [0.025    0.975]
Intercept                                   2.1291    18.099     0.118     0.906   -33.371    37.629
neighborhood[T.Back Bay]                   27.5163     7.615     3.614     0.000    12.581    42.452
neighborhood[T.Bay Village]                37.3705    11.533     3.240     0.001    14.750    59.991
neighborhood[T.Beacon Hill]                22.2723     7.574     2.941     0.003     7.417    37.128
neighborhood[T.Brighton]                   -1.3180     7.051    -0.187     0.852   -15.149    12.513
neighborhood[T.Charlestown]                51.5437     9.207     5.598     0.000    33.484    69.603
neighborhood[T.Chinatown]                  44.1577    16.129     2.738     0.006    12.522    75.794
neighborhood[T.Dorchester]                 13.0525     5.998     2.176     0.030     1.287    24.818
neighborhood[T.Downtown]                   13.8168     7.179     1.925     0.054    -0.265    27.898
neighborhood[T.East Boston]                24.7075     7.144     3.458     0.001    10.695    38.720
neighborhood[T.Fenway]                     53.2351     8.691     6.126     0.000    36.190    70.281
neighborhood[T.Hyde Park]                 -20.4330    10.023    -2.039     0.042   -40.091    -0.775
neighborhood[T.Jamaica Plain]              19.1311     6.552     2.920     0.004     6.280    31.982
neighborhood[T.Leather District]           54.8785    50.976     1.077     0.282   -45.106   154.863
neighborhood[T.Longwood Medical Area]      -9.8361    36.202    -0.272     0.786   -80.842    61.170
neighborhood[T.Mattapan]                   -6.0983    10.440    -0.584     0.559   -26.574    14.378
neighborhood[T.Mission Hill]               27.8666    10.265     2.715     0.007     7.732    48.001
neighborhood[T.North End]                  51.1574     9.852     5.193     0.000    31.834    70.481
neighborhood[T.Roslindale]                  6.0916     8.562     0.711     0.477   -10.702    22.885
neighborhood[T.Roxbury]                     3.7661     6.502     0.579     0.563    -8.988    16.520
neighborhood[T.South Boston]               34.6359     7.453     4.648     0.000    20.019    49.253
neighborhood[T.South Boston Waterfront]    64.3980    19.859     3.243     0.001    25.446   103.350
neighborhood[T.South End]                  25.9874     6.992     3.717     0.000    12.273    39.702
neighborhood[T.West End]                   58.1533    13.648     4.261     0.000    31.384    84.923
neighborhood[T.West Roxbury]                6.2262    11.517     0.541     0.589   -16.364    28.816
room_type[T.Hotel room]                    68.6711    12.257     5.603     0.000    44.630    92.712
room_type[T.Private room]                 -56.0406     2.918   -19.203     0.000   -61.765   -50.316
room_type[T.Shared room]                 -102.2883    19.421    -5.267     0.000  -140.380   -64.197
num_reviews                                -0.0603     0.016    -3.693     0.000    -0.092    -0.028
rating                                      0.9422     0.182     5.176     0.000     0.585     1.299
num_amenities                               1.3247     0.164     8.095     0.000     1.004     1.646

Omnibus:            304.363             Durbin-Watson:        1.796
Prob(Omnibus):      0.000               Jarque-Bera (JB):     547.118
Skew:               1.116               Prob(JB):             1.57e-119
Kurtosis:           4.668               Cond. No.             5.32e+03


Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 5.32e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
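
The F-statistic in the summary is closely tied to the $R^{2}$ value: it compares the variation explained per predictor to the variation left unexplained per residual degree of freedom. As a rough check, the sketch below recomputes it from the rounded numbers in the table above, so the result differs slightly from the reported 34.70 because $R^{2}$ has been rounded to three decimals.

```python
# Values read from the summary table above
r_squared = 0.385   # R-squared (rounded to three decimals)
df_model = 30       # Df Model: number of estimated slope coefficients
df_resid = 1661     # Df Residuals: observations minus estimated parameters

# F = (explained variation per predictor) / (unexplained variation per residual df)
f_stat = (r_squared / df_model) / ((1 - r_squared) / df_resid)
print(round(f_stat, 2))  # 34.66
```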

Can We Improve Our Model?

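So far, each predictor has contributed to the predicted price independently of the others. To see whether we can do better, we refit both models with interaction terms: replacing + with * in the model formula tells statsmodels to include every predictor along with the interactions between them, so that, for example, the effect of room type on price is allowed to differ from one neighborhood to another.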

Comparing Regression Models from 2019 and 2020

In [39]:
from statsmodels.formula.api import ols
import numpy as np

# Defines that we want a multiple linear regression model to predict 2019 price based on the listed variables
model_2019 = ols('price_2019 ~ neighborhood*room_type*num_reviews*rating*num_amenities', data=trim_df)
model_2019 = model_2019.fit() # Fits the model 

# Repeats for predicting 2020 price
model_2020 = ols('price_2020 ~ neighborhood*room_type*num_reviews*rating*num_amenities', data=trim_df)
model_2020 = model_2020.fit()

# Prints out the computed R-squared value and F-test p-value of both models
print('2019 model:\n' + 
      'R-squared value: ' + str(np.round(model_2019.rsquared, decimals=3)) + '\n'
      'F-test p-value: ' + str(model_2019.f_pvalue) + ' ≈ ' + str(np.round(model_2019.f_pvalue, decimals=3)) + '\n')

print('2020 model:\n' + 
      'R-squared value: ' + str(np.round(model_2020.rsquared, decimals=3)) + '\n'
      'F-test p-value: ' + str(model_2020.f_pvalue) + ' ≈ ' + str(np.round(model_2020.f_pvalue, decimals=3)))
2019 model:
R-squared value: 0.668
F-test p-value: 5.255463015619756e-162 ≈ 0.0

2020 model:
R-squared value: 0.572
F-test p-value: 1.0085380063016781e-100 ≈ 0.0
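
The $R^{2}$ values rise to 0.668 for 2019 and 0.572 for 2020, but these models also estimate many more coefficients, and $R^{2}$ never decreases when predictors are added to an OLS model. One common way to account for this is the adjusted $R^{2}$, which penalizes the number of predictors. The sketch below defines it (the helper name adjusted_r_squared is ours) and checks it against the basic 2020 model's summary, where it reproduces the reported 0.374. We do not know the number of coefficients in the interaction models from the output above, so we leave that comparison open.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Check against the basic 2020 model summary:
# R-squared = 0.385, No. Observations = 1692, Df Model = 30
basic_adj = adjusted_r_squared(0.385, 1692, 30)
print(round(basic_adj, 3))  # 0.374
```

For the interaction models, statsmodels exposes the same quantity directly as model_2019.rsquared_adj and model_2020.rsquared_adj, which would be the fairer numbers to compare with the basic models.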

**Conclusion**

In conclusion, we found that prices for Boston Airbnb rentals were cheaper on average in 2020 compared to 2019. We also found that we can predict the prices of Boston Airbnb rentals moderately well with multiple linear regression models based on a rental's neighborhood, room type, number of reviews, number of amenities, and rating. The model for predicting prices in 2019 was better than the model for predicting prices in 2020. This makes sense considering that with the pandemic, 2020 is a year full of uncertainty and variability that we could not account for in our model. This decreased predictive power in 2020 and overall decrease in prices may be due to a variety of reasons. For example, people on average may be lowering rental prices to try to attract more guests to account for reduced travel, but there may also be some hosts who are trying to raise prices to account for greater cleaning costs or lack of revenue. In the future, more work could be done in trying to improve the model by trying other regression methods or taking into account some of the other variables in the original dataset like rental availability, but it is of course not going to be easy or even possible to predict Airbnb prices perfectly!

Anyhow, if you are looking to stay in an Airbnb in Boston during this time and looking for a cheaper option, we recommend looking for a shared or private room located in Hyde Park or Longwood Medical Area. If you want to be located closer to the center of Boston, then Downtown is probably your best bet! Avoid looking for rentals in South Boston Waterfront, West End, and Leather District, which tend to be noticeably more expensive. While other variables like the number of reviews, rating, and number of amenities have only a small effect on price, expect a higher rated rental with lots of amenities to be somewhat more expensive.